Isolated and Connected Word Recognition-Theory and Selected Applications
Abstract
The art and science of speech recognition have been advanced to the state where it is now possible to communicate reliably with a computer by speaking to it in a disciplined manner using a vocabulary of moderate size. It is the purpose of this paper to outline two aspects of speech-recognition research. First, we discuss word recognition as a classical pattern-recognition problem and show how some fundamental concepts of signal processing, information theory, and computer science can be combined to give us the capability of robust recognition of isolated words and simple connected word sequences. We then describe methods whereby these principles, augmented by modern theories of formal language and semantic analysis, can be used to study some of the more general problems in speech recognition. It is anticipated that these methods will ultimately lead to accurate mechanical recognition of fluent speech under certain controlled conditions.
I. INTRODUCTION
Although a great deal has been learned about the fundamental processes of speech production and speech perception, the goal of mechanical recognition of fluent speech remains elusive [1]-[5]. Speech recognition, however, has made major strides forward in the past decade, and it has advanced to the point where several commercial systems are currently available [6]-[11]. These commercial systems are predominantly isolated word, speaker-trained systems which achieve word accuracies greater than 95 percent in noisy environments. At least one system, however, is a speaker-independent, isolated word recognizer operating over dialed-up telephone lines [7], while another is a speaker-trained system that can handle a connected string of words (typically digits) [8].
In the laboratory, equally impressive advances have been recorded for speech recognition. A wide range of systems based on isolated words (both speaker-trained and speaker-independent) have been developed for use over dialed-up telephone lines [12]-[16]; and more recently speaker-independent, connected-word systems have been proposed for connected digit recognition [8], [17]-[19]. As the capabilities of the word recognizers have improved, the tasks to which they have been applied have become more sophisticated, and more difficult. Such tasks have included chess playing, data retrieval and management, airlines information and reservations, and automatic recognition of read text extracted from patents on lasers [20]-[23]. Much of this advanced research has been under the auspices of ARPA, and excellent summaries of this work are available in [4] and [24].
This paper is intended to be a tutorial in the concepts and theories underlying modern speech-recognition systems, both practical and experimental. Two aspects of the subject are given special attention. First, we treat speech recognition as a classical problem in pattern recognition and show how some fundamental ideas from signal processing, information theory, and computer science can be utilized to provide the capability of robust recognition of isolated words and simple connected-word sequences. We then describe methods whereby these principles, augmented by modern theories of formal language and semantic analysis, can be combined to study some of the more general problems of speech recognition. In particular, we show how these theories can be applied to improve the recognition accuracy of a nonideal acoustic pattern recognizer. It is anticipated that these investigations will ultimately afford accurate mechanical recognition of fluent speech provided that it is part of a "task-oriented" dialog, that is, it is restricted to pertain to a well-defined, carefully circumscribed topic.
The outline of this paper is as follows. We begin, in Section II, with an overview of the pattern-recognition aspects of speech recognition. In this section, we are concerned with methods for short-time spectral estimation and elaborate on the filter bank and linear prediction methods. We also discuss other aspects of the pattern-recognition paradigm including similarity measures, temporal alignment of speech patterns, and statistical decision strategies. We then describe, within this framework, the basic word-recognition system, giving some details of its implementation, operation, and performance.
Section III provides a discussion of the application of pattern-recognition techniques to the construction of speech-recognition systems designed to perform specific tasks. In these examples, the special requirements of each task influence the way the general theories are utilized and adapted. A relatively straightforward application of the basic principles allows us to build a voice-operated telephone repertory dialer. This particular system exploits some rudimentary task constraints by partitioning the vocabulary into sets of which only one is appropriate to any of the specific types of commands to which the system can respond. A more sophisticated kind of constraint is used to build a telephone directory listing retrieval system. Here the constraints implicit in a telephone directory are used to actually correct acoustic-recognition errors. In this context, we intro-
The authors are with Bell Laboratories, Murray Hill, NJ 07974. Manuscript received October 17, 1980; revised January 8, 1981.
0090-6778/81/0500-0621$00.75
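The pattern-recognition paradigm outlined in the introduction (a similarity measure between spectral frames, temporal alignment of two speech patterns, and a decision rule) can be illustrated with a minimal Python sketch. It assumes dynamic time warping with a Euclidean frame distance and a nearest-template decision rule. These choices, the function names `frame_distance`, `dtw_distance`, and `recognize`, and the toy one-dimensional "feature vectors" are illustrative assumptions, not details taken from the paper.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test, ref):
    """Accumulated frame distance along the best monotonic alignment of two patterns."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # test frame repeated
                                 cost[i][j - 1],      # reference frame repeated
                                 cost[i - 1][j - 1])  # both advance
    # Divide by n + m so patterns of different lengths are comparable
    return cost[n][m] / (n + m)


def recognize(test, templates):
    """Nearest-template decision: return the word whose reference aligns best."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


# Hypothetical one-dimensional reference patterns for a two-word vocabulary
templates = {
    "yes": [[0.0], [1.0], [2.0]],
    "no": [[5.0], [4.0], [3.0]],
}
# A slower utterance of the first pattern (frames repeated)
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]
print(recognize(utterance, templates))
```

The alignment step lets the slower utterance match its reference despite the difference in duration. A practical recognizer would replace the toy features with short-time spectral frames (for example, filter-bank or linear-prediction features) and might use several templates per word.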
Similar resources
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
An embedded word training procedure for connected digit recognition
The "conventional" way of obtaining word reference patterns for connected word recognition systems is to use isolated word patterns, and to rely on the dynamics of the matching algorithm to account for the differences in connected speech. Connected word recognition, based on such an approach, tends to become unreliable (high error rates) when the talking rate becomes grossly incommensurate with...
On the Application of Embedded Digit Training to Speaker Independent Connected Digit Recognition
In recent years, several algorithms have been proposed for recognizing a string of connected words (typically digits) by optimally piecing together reference patterns corresponding to the words in the string. Although the algorithms differ greatly in details of implementation, storage requirements, etc., they all have essentially the same performance in that their ability to match the unknown s...
Design and implementation of a speech server for unix based multimedia applications
In this paper we describe a general purpose speech recognition server (SRS) that provides a standard interface between applications and speech recognition modules. The recognition modules cover different techniques such as speaker dependent or independent, isolated or connected word recognition. The SRS is designed mainly for multimedia applications running on a network of UNIX workstations. Our...
Publication date: 1981